Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors

  • B Ewing
  • P Green
Abstract

Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
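The abstract does not state how a per-base error probability is reported. A minimal sketch follows, assuming the logarithmic quality scale conventionally associated with phred, Q = -10 · log10(P), where P is the estimated probability that a base-call is wrong. The function names are illustrative and are not taken from phred, phrap, or consed.

```python
import math
from typing import Iterable


def error_prob_to_quality(p: float) -> float:
    """Convert an error probability P into a quality value Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def quality_to_error_prob(q: float) -> float:
    """Invert the scale: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def expected_errors(qualities: Iterable[float]) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base P."""
    return sum(quality_to_error_prob(q) for q in qualities)
```

On this scale a quality value of 30 corresponds to an estimated error rate of 1 in 1000, and summing the per-base probabilities gives the expected number of errors in a read, one way a downstream tool might weigh reads against each other.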


Similar resources

Base-calling of automated sequencer traces using phred. I. Accuracy assessment.

The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, impro...


Genome Sequencing and Bioinformatics Analyses of Higher Plants Chloroplasts

Chloroplast DNA in higher plants exists as closed circular molecules of about 150 kb (±30), usually presenting inverted repeat sequences separating two single copy regions [1]. The complete chloroplast genomes of around 13 higher plant species are available in the gene bank. Our group has completely sequenced the sugarcane chloroplast DNA, which is 141182 nucleotides in size. We have...


Dna Sequences Base Calling by Phred: Error Pattern Analysis

PHRED is the most frequently used base caller algorithm in genome projects. An interesting point on PHRED utilization is the fact that a low score on some base may not actually correspond to a miscalling on that base, but it may stand for a putative error on the region around this base. In order to evaluate the efficiency of PHRED on base calling and base quality assigning, we have sequenced pU...


Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data

New DNA sequencing technology implemented in the GS 20 sequencer reduces cost and time in exchange for lower accuracy. DNA sequencing errors negatively impact downstream applications, and therefore accurate base calls and error probabilities are invaluable to researchers. This paper applies a graphical model to the base calling problem in the context of the GS 20 sequencer. This model integrates sig...


PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.

A fundamental challenge in analyzing next-generation sequencing (NGS) data is to determine an individual's genotype accurately, as the accuracy of the inferred genotype is essential to downstream analyses. Correctly estimating the base-calling error rate is critical to accurate genotype calls. Phred scores that accompany each call can be used to decide which calls are reliable. Some genotype ca...



Journal:
  • Genome Research

Volume 8, Issue 3

Pages: -

Publication date: 1998